Commercial Eucalyptus plantations are a critical component of the forestry sector across South America, providing timber, pulpwood, and carbon sequestration services. Monitoring their health over time using satellite imagery is essential for detecting disturbances early, planning harvests, and assessing long-term stand productivity. However, Landsat time series data are rarely complete: cloud cover, sensor gaps, and acquisition scheduling leave substantial periods without observations.
This project applies Generalized Additive Models (GAMs) to a 20-year Landsat multispectral time series from a Eucalyptus stand in South America to characterize vegetation dynamics, fill temporal gaps in spectral index data, and distinguish seasonal variation from long-term trends. The dataset spans 2000 to 2019 and captures a striking disturbance event around 2002 to 2003 where NDVI dropped sharply from values near 0.90 to below 0.35, followed by a multi-year recovery trajectory that GAMs are well-positioned to describe.
Satellite remote sensing produces time series data that are inherently gappy. Cloud cover is the most common cause of missing observations, but sensor malfunctions, orbital gaps, and radiometric processing failures also contribute. In the dataset used here, only 189 of 395 scheduled Landsat scenes contain valid spectral reflectance values for the stand of interest, meaning roughly 52% of observations are missing.
Simple interpolation approaches such as linear interpolation or moving averages struggle with data of this kind because they cannot simultaneously account for long-term trends and within-year seasonality. A model that treats all gaps as equivalent, regardless of whether they fall in a high-growth summer period or a dormant winter period, will produce systematically biased estimates. GAMs address this limitation by explicitly modelling both the temporal trend and the seasonal cycle as separate smooth components, allowing gap-filling predictions to be ecologically informed.
The dataset contains multispectral reflectance values from the Landsat satellite mission for a single Eucalyptus stand (ID: 32041-1-5) in South America. Each observation includes surface reflectance in five Landsat bands and three derived spectral indices: NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), and EVI (Enhanced Vegetation Index). Time is represented as a continuous decimal year value (Year_Continuous) to allow use in the smooth functions of the GAM.
The chart below shows all valid observations for the three spectral indices across the 20-year study period. Use the buttons to switch between indices. The dramatic drop visible around 2002 to 2003 reflects a disturbance event followed by a gradual recovery of canopy greenness and moisture content over subsequent years.
GAMs extend linear models by replacing fixed-slope terms with flexible smooth functions,
making them well-suited for ecological time series where the relationship between time and
vegetation response is nonlinear. The smoothing effect comes from two components: the smooth
function s(), which uses spline basis functions to flexibly fit the data, and a
penalty term that controls how wiggly the fitted curve can become. The degree of smoothness
is automatically tuned using Restricted Maximum Likelihood (REML), preventing the model from
simply connecting every observed point.
Four models of increasing complexity were fitted to characterize different aspects of the vegetation signal. Each builds on the previous by adding components that capture additional sources of variation.
A GAM with a single Gaussian process smooth on the continuous year variable. Captures the long-term trend in NDVI without accounting for within-year seasonality.
A cyclic cubic spline on day-of-year, fitted only to post-2008 data where the stand has recovered. Isolates the within-year seasonal pattern of canopy greenness.
Combines a Gaussian process smooth on year with a cyclic cubic spline on day-of-year. This additive structure separates the long-term recovery trend from seasonal fluctuations.
Adds a tensor interaction term between year and day-of-year, allowing the seasonal amplitude to shift over time. Applied to NDMI to test whether moisture seasonality changed across the study period.
reg_03 <- gam( ndvi ~ s(Year, bs = "gp") + s(doy, bs = "cc", k = 20), data = subset(my_sdata, Year >= 2004), method = "REML" ) summary(reg_03)
reg_04 <- gam( ndmi ~ s(Year) + s(doy, bs = "cc") + ti(doy, Year), data = my_sdata, method = "REML", select = TRUE )
new_data <- expand.grid(Year = 2003:2020, doy = 1:365) new_data$Year_Continuous <- new_data$Year + new_data$doy / 365 pred_gam <- predict(reg_03, newdata = new_data, se.fit = TRUE)
The combined trend-plus-seasonality model (reg_03) explained 76.7% of the deviance in NDVI with an adjusted R² of 0.782, substantially outperforming the trend-only model. The Gaussian process smooth on year captured the sharp decline and multi-year recovery following the 2003 disturbance event, while the cyclic cubic spline on day-of-year captured the within-year seasonal pulse of photosynthetic activity characteristic of Eucalyptus stands.
The NDMI models showed an even better fit than NDVI, likely because moisture-sensitive spectral indices exhibit less high-frequency noise than NDVI in densely vegetated Eucalyptus canopies. The tensor interaction model for NDMI confirmed a statistically significant interaction between year and day-of-year, meaning the amplitude of seasonal moisture variation shifted over the course of the study period, an ecologically meaningful finding for plantation managers tracking drought stress.
| Model | Index | Components | Deviance Explained | Adj. R² |
|---|---|---|---|---|
| reg_01b | NDVI | Temporal trend only | ~55% | ~0.54 |
| reg_02 | NDVI | Seasonal cycle only (post-2008) | ~40% | ~0.38 |
| reg_03 | NDVI | Trend + seasonality | 76.7% | 0.782 |
| reg_03b | NDMI | Trend + seasonality | >78% | >0.78 |
| reg_04 | NDMI | Trend + seasonality + interaction | Significant interaction term | N/A |
The GAMs fitted here appear to slightly underfit in periods of high seasonal variation,
where confidence intervals widen and the model struggles to follow rapid transitions.
This is a deliberate trade-off: the smoothing penalty prevents overfitting to noise
while accepting some bias in high-variability periods. If overfitting were a concern,
reducing the number of knots k or increasing the smoothing penalty would
constrain the curve further.
One important limitation of these models is that they are entirely aspatial. They describe temporal dynamics at a single stand without accounting for spatial autocorrelation among neighboring stands. In a landscape where adjacent Eucalyptus plots share similar soil conditions, microclimates, and management histories, ignoring spatial structure may underestimate uncertainty and overstate the generalizability of the fitted trends. Two natural extensions would be spatial lag or error models, which explicitly incorporate a spatial weights matrix, or Geographically Weighted Regression, which allows model coefficients to vary across the landscape. Both approaches have been applied to satellite-derived spectral indices in forestry contexts and would be straightforward extensions of this framework.
A parametric alternative to GAMs would be a linear mixed-effects model with polynomial terms for year and day-of-year. Mixed-effects models are more interpretable since each coefficient directly describes the direction and magnitude of a predictor's effect, but they require assumptions about the functional form of the temporal trend that GAMs do not. A non-parametric alternative such as Random Forest can capture more complex nonlinear relationships without any functional assumptions, but at the cost of interpretability: understanding how a variable drives predictions requires additional tools such as SHAP values rather than simply reading off model coefficients.
The gap-filling framework demonstrated here has direct operational relevance for plantation forestry management. Most commercial Eucalyptus operations use satellite imagery to track stand health, plan fertilization and irrigation schedules, and detect early signs of pest or disease outbreaks. When cloud cover interrupts the time series during a critical growth window, managers lose the ability to make informed decisions. GAMs trained on historical data from the same stand can fill those gaps with uncertainty-quantified predictions rather than simply flagging the period as missing.
Beyond gap-filling, this approach is directly applicable to modelling leaf area index and above-ground biomass as a function of climate variables such as temperature and precipitation, helping predict how stand productivity responds to changing conditions. It is also applicable to early warning systems for drought stress, where NDMI trends can signal moisture deficits before they become visible in NDVI, giving managers an earlier window for intervention.